Abstract

Leukocytes, which are produced in the bone marrow, make up about one percent of all blood cells. When these white blood cells grow uncontrollably, blood cancer develops. The proposed research presents an approach for categorizing the kind of blood cancer. Multiple Myeloma (MM) and Acute Lymphoblastic Leukaemia (ALL) are the two diseases that make use of the SN-AM dataset. In the malignancy known as acute lymphoblastic leukaemia (ALL), [...] than their release into the bloodstream. Hence, the growth of these blood cells must be resisted and prevented. Previously, the evaluation was carried out manually by experienced haematologists. The proposed methodology aims to eliminate the chance of human error by using deep learning methods, particularly convolutional neural networks. A total of 3256 peripheral blood smear (PBS) images from 89 ALL patients were acquired from an online portal. A modified and optimized convolutional neural network is trained on these images, and its ability to predict which type of malignancy is present in the cells is evaluated. In 96 out of 100 cases, the algorithm reproduced the measurements corresponding to the samples. The accuracy of the system was found to be 97.6%, which is higher than that of techniques such as Decision Trees, Random Forests, Naive Bayes, Support Vector Machines (SVMs), VGG16, VGG19, AlexNet, GoogLeNet, and MobileNetV2. The work shows that the Modified CNN performs more accurately.


1. Introduction

The three cell types that make up blood are platelets, red blood cells, and white blood cells. These cells are continuously produced in the bone marrow and released into the bloodstream. Blood cancer is usually caused by the rapid proliferation of atypical blood cells, which inhibits the creation of healthy blood cells. Leukaemia, myeloma, and lymphoma make up the bulk of blood cancer cases. Acute lymphocytic leukaemia (ALL) is a white blood cell cancer that affects the bone marrow. In medical usage, the term "acute" indicates that the disease begins suddenly and progresses over a short period.

Likewise, in a CNN-based approach to classifying ALL, first- and final-layer features are integrated, and a dropout layer is used to prevent overfitting (Banik, Saha, and Kim). WBCs affected by ALL are classified into the L1, L2, and L3 subtypes (Shafique and Tehsin). Leukaemia is divided into chronic and acute types. With transfer learning-based feature extraction using three modified ResNet models, the best sensitivity (100%) was attained, whereas manually detecting a haematological disease takes time and is prone to error (Das, Pradhan, and Meher). WBCsNet is a CNN-based classification system for WBCs. It categorises the five kinds of white blood cells (monocyte, lymphocyte, basophil, eosinophil, and neutrophil) using three deep learning algorithms (I Shahin et al.). Binary classification of healthy versus malignant cells was compared across ten state-of-the-art architectures (Gehlot, A. Gupta, and R. Gupta). A hybrid model that combines MobileNetV2-based feature extraction with SVM-based classification exhibits promising performance owing to the synergy of the two components. The hybrid model yields the second-best overall performance, with an accuracy of 97.18%.
Additionally, on the ALL-IDB1 and ALL-IDB2 datasets, it obtains the best accuracy, achieving 97.92% and 96.00%, respectively, with 50% of the data used for training and 50% for testing (Das and Meher). In another approach, the cytoplasm and nucleus are segmented, and features are extracted based on shape and texture. Different classifiers are investigated on various combinations of feature sets. Trials using normal cells served as the foundation for the reported findings. SVM demonstrated the greatest accuracy: 92% (Laosai and Chamnongthai). The C-NMC dataset is used to validate another proposed technique; this dataset has challenging characteristics that make ALL detection more difficult, notably the morphological similarity between ALL and healthy images (R. Gupta, Gehlot, and A. Gupta). The automatic approach to classification is economical and could be easily implemented in rural as well as urban places. The
suggested method addresses the errors introduced by laborious manual classification, the need for a qualified professional, and the difficulty of differentiating cells seen under a microscope. An autoencoder is designed to copy its input only approximately, and only for input that is similar to the training data. Because the model is forced to prioritise which elements of the input to copy, it typically picks up important attributes of the data (Goodfellow, Bengio, and Courville). In another approach, shape, texture, and colour features are extracted from the segmented image, and the WBCs are classified with a support vector machine (SVM) using an RBF kernel. The accuracy and sensitivity of that approach are both 96.00%, which is an encouraging result [10]. CNNs operate as feature extractors: every convolution layer of the network identifies a new feature that appears in the images and causes a high activation. This article presents a robust and reliable automatic classification approach for the types of ALL blood cancer using a convolutional neural network. The article therefore assesses how well the proposed deep learning model performs using metrics such as recall, sensitivity, specificity, and accuracy. Predicting the kind of cancer in the dataset using the given model is the article's primary concept. The model uses fewer computations and trainable parameters to classify the kind of cancer than existing machine learning and pretrained deep learning models.


2. Related works 


The analysis of affected blood cells typically consists of three steps: feature extraction, classification, and quantification. Numerous investigations on various cancers, such as leukaemia, lymphoma, and myeloma, have been carried out. ALL-affected blood cells have been classified with various algorithms, including CNN, FNN, SVM, and KNN; a CNN with an 8-layer framework reaches an accuracy of 98.33%, whereas the others stay below 95% (Rajpurohit et al.). Because CNNs require extensive hyperparameter tuning, the authors used an optimised CNN method to achieve high accuracy (Tuba et al.). The average accuracy of ML methods for leukaemia detection in PBS image analysis was 97% (Ghaderzadeh, Asadi, and Hosseini). To categorise acute myeloid leukaemia, the dataset was first expanded by applying many transformations; the final step employs a 7-layer convolutional neural network (Thanh et al.). A neural network (NN) classifier employing the Bayes regularisation (BR) approach was used to categorise ALL, and it attained an accuracy of 98.7% (Bhuiyan et al.). With the
aid of image processing techniques, the authors use SVM to analyse the various forms of blood cancer in blood smear images of normal and cancerous individuals (Elrefaie, Marzouk, and Mohamed). When MobileNet is used for training, KNN obtains 95% (Abhishek et al.). Leukocytes, or white blood cells, are divided into two categories: a two-stage colour segmentation technique is applied, and features such as the shape and texture of the nucleus are used by an SVM for final detection (Mohapatra and Patra). In another study, 90% of the dataset was used for training and 10% for testing (Sakib et al.). With an overall accuracy of 93.7%, the authors classified the cells into normal and blast using an SVM classifier on the ALL-IDB-1 dataset (Shafique et al.). A dual-branch architecture is built using a projection loss together with the cross-entropy loss to boost performance and to leave room for addressing the label noise issue. The best symmetrical accuracy achieved by that design is 94.17% (Shiv, Anubha, and Ritu). A dual-module deep learning framework for ALL classification was also put forward. A compact CNN is used in one module as the primary classifier, and a kernel SVM is used in the other module as an auxiliary classifier (Shiv, A. Gupta, and R. Gupta). In another work, the dataset's annotations were generated automatically during the generation process. The authors used the dataset to train a deep neural network, which, when evaluated against the well-known ALL-IDB dataset, obtained a precision of 98.72% (Al-Qudah and Suen).
Alert-Net is a deep learning network with five convolution layers, two fully connected layers, and a softmax layer. 2,415 images from 16 datasets were used in the experiments, and the accuracy was 97.18% (Claro et al.). An ensemble classifier (a combination of MLP, KNN, and SVM classifiers) was used to differentiate between malignant and benign tissue, but at an increased computational cost (Mohapatra, Patra, and Satpathy). For a successful ALL classification, significant geometrical, colour, and statistical texture features are extracted, and HRC-NNs are applied to categorise WBCs (Su, Cheng, and Wang). WBC segmentation becomes more accurate with a semantic segmentation method based on transfer learning: DeepLabV3+ was used for segmentation and a 5-layer AlexNet for classification (Reena and Ameer).
To classify ALL efficiently, an ANN was used together with a segmentation method based on k-medoids that separates the cytoplasm from the nucleus. Despite being slower than k-means, the k-medoids technique delivers superior segmentation performance and is more stable (Acharya and Kumar).


3. Proposed Methodology 


The model has five layers, namely three convolutional layers and two hidden layers. It is trained on the training set before being used to generate predictions on the testing set.


3.1. Dataset Descriptions 


The dataset includes 3256 peripheral blood smear (PBS) images taken from 89 patients with ALL. The data are split into benign and malignant categories. The former is made up of haematogones, while the latter is the ALL group, which contains three malignant lymphoid subtypes: Early Pre-B, Pre-B, and Pro-B ALL. Together with the benign class, this gives the four classes shown in Fig. 1. Each image was captured with a Zeiss microscope at 100x magnification. The images were stored as a ZIP file, which was extracted into folders for processing. Flow cytometry was used to determine the precise types and subtypes of the cells. Segmented images, obtained by colour thresholding in the HSV colour space, are also provided. The dataset was already augmented.


3.2. Pre-processing and Feature Extraction 


In pre-processing, an array for the training data and the variable img_size = 50 are declared. Image values are stored as arrays using the numpy and matplotlib libraries. Because the dataset consists of images, they are processed with OpenCV (cv2). First, each image is converted to grayscale. Noise is then reduced using medianBlur, and, because the images in the dataset have different sizes, each image is resized to the common size img_size = 50. The resulting matrices are appended to the training data for all images in the dataset. The data are then shuffled randomly for better efficiency. The variables X and y are declared for feature extraction. Values captured from an image, such as shape, size, colour, and texture, are known as features and are stored in X as an array. The dataset consists of four categories: benign, early, pre, and pro. To classify these categories, their values are stored as labels in the variable y. Pickling serialises a Python object's state into a byte stream from which the object can be restored. Here, X.pickle was created to dump X (the features); it occupies megabytes because it holds the image values. Similarly, Y.pickle was created to write y (the labels); it occupies only kilobytes because it stores just the class indices 0, 1, and so on. After pre-processing and feature extraction, the data were randomly separated into training and testing sets: 80% for training and 20% for testing, as shown in Figs. 2 and 3.

FIGURE 1. (a) Benign (b) Early (c) Pre-B (d) Pro-B

FIGURE 2. Process of the dataset
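
The bookkeeping steps described in this section (shuffling, separating features X from labels y, pickling, and the 80/20 split) can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' code: the tiny 2 x 2 stand-in "images", the helper names build_xy and split_80_20, and the in-memory pickle round trip are assumptions made for illustration, and the grayscale, medianBlur, and resize steps are assumed to have been applied already.

```python
import pickle
import random

CATEGORIES = ["benign", "early", "pre", "pro"]  # class index = position in list


def build_xy(samples, seed=0):
    """Shuffle (image, category) pairs and split them into features X and labels y."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    X = [image for image, _ in shuffled]
    y = [CATEGORIES.index(category) for _, category in shuffled]
    return X, y


def split_80_20(X, y):
    """Use the first 80% of the shuffled data for training, the rest for testing."""
    n_train = len(X) * 80 // 100
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


# Hypothetical stand-ins for pre-processed images (real ones would be 50 x 50).
samples = [([[i, i], [i, i]], CATEGORIES[i % 4]) for i in range(20)]
X, y = build_xy(samples)

# Pickle round trip; in the text the bytes are written to X.pickle and Y.pickle.
X_restored = pickle.loads(pickle.dumps(X))
y_restored = pickle.loads(pickle.dumps(y))

X_train, X_test, y_train, y_test = split_80_20(X_restored, y_restored)
print(len(X_train), len(X_test))  # 16 4
```

Applied to the 3256 images of the dataset, the same rule gives 2604 training and 652 testing images, which matches the counts reported in Section 3.4.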


3.3. Model Development 


A modified CNN model is used in this study to classify ALL into the benign, early, pre, and pro classes. CNNs are the building blocks of many image classification algorithms. They classify images quickly and accurately and, compared to other image classification techniques, require less pre-processing. The modified CNN model has three convolutional layers and two hidden layers. The model presented in this article accepts an image as input and predicts the type of cancer, as shown in Fig. 5.
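
To make the architecture concrete, the following sketch traces feature-map sizes and counts trainable parameters through three convolution plus max-pooling stages and the dense layers. Only the layer counts and the 32/64/64 feature maps (Section 3.3.1) come from the text. Everything else is an assumption for illustration: a 50 x 50 single-channel input (from img_size = 50 and the grayscale conversion), 3 x 3 kernels with stride 1 and no padding, 2 x 2 pooling with stride 2, 128 units in each hidden layer, and 4 output classes.

```python
def conv_out(size, k=3, p=0, s=1):
    """Output length along one axis after a convolution."""
    return (size - k + 2 * p) // s + 1


def pool_out(size, k=2, s=2):
    """Output length along one axis after max pooling without padding."""
    return (size - k) // s + 1


def trace(input_size=50, channels=1, filters=(32, 64, 64), dense=(128, 128, 4)):
    """Return (flattened feature count, total trainable parameters)."""
    size, params = input_size, 0
    for f in filters:
        params += (3 * 3 * channels + 1) * f  # kernel weights + one bias per filter
        size = pool_out(conv_out(size))
        channels = f
    features = size * size * channels  # length of the flattened array
    units_in = features
    for units in dense:
        params += (units_in + 1) * units  # weights + one bias per unit
        units_in = units
    return features, params


features, params = trace()
print(features, params)  # 1024 203972
```

Under these assumptions the spatial size shrinks from 50 to 24, 11, and finally 4, so the flattened vector has 4 x 4 x 64 = 1024 entries, and most of the parameters sit in the first hidden layer.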


3.3.1. Convolution layer 


The convolution layer is the first layer into which the image is fed; it is made up of neurons that act as feature extraction units. An activation map is created by convolving the input image with a k x k (here 3 x 3) filter matrix. The number of pixels by which the filter shifts over the image at each step is referred to as the stride. For an input image of size a x b, a convolution with a kernel of size k, padding p, and stride s yields an output of size

$$\left(\frac{a - k + 2p}{s} + 1\right) \times \left(\frac{b - k + 2p}{s} + 1\right) \tag{1}$$
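
As a worked instance of the output-size formula, the sketch below evaluates it for a few settings. The function name and the example values (a 50 x 50 input taken from img_size = 50) are illustrative assumptions; integer division reflects that a filter position only counts when the kernel fits entirely inside the padded image.

```python
def conv_output_size(a, b, k, p=0, s=1):
    """Output height and width of a convolution on an a x b input.

    k: kernel size, p: padding on each side, s: stride.
    """
    height = (a - k + 2 * p) // s + 1
    width = (b - k + 2 * p) // s + 1
    return height, width


print(conv_output_size(50, 50, k=3))       # (48, 48)
print(conv_output_size(50, 50, k=3, p=1))  # (50, 50)
print(conv_output_size(50, 50, k=3, s=2))  # (24, 24)
```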

The suggested model begins with three convolution layers, each using ReLU as its activation function and followed by max pooling. The first layer has 32 feature maps, whereas the other two layers have 64 each. Equation (2) defines the standard ReLU function, and Equation (3) gives the output size of MaxPooling2D. Equation (4) gives the activation map produced by applying the convolution of the kernel function to the input image.

$$f(x) = \max(0, x) \tag{2}$$

where x is the input of the neuron: if a negative value is supplied, the function returns 0, and if a positive value is supplied, the function returns the same value. ReLU helps to address the vanishing gradient problem.

$$\left(\frac{a - k}{s} + 1\right) \times \left(\frac{b - k}{s} + 1\right) \tag{3}$$

where the maximum value within each region of an image of size a x b is selected using a kernel of size k and a stride of size s, as shown in Fig. 4.
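
A small pure-Python sketch of max pooling may help here. It assumes a 2 x 2 window with stride 2 and no padding (the text does not state the pooling size), and the 4 x 4 activation map is made up for illustration.

```python
def max_pool2d(image, k=2, s=2):
    """Max pooling over k x k windows with stride s (no padding)."""
    rows = (len(image) - k) // s + 1
    cols = (len(image[0]) - k) // s + 1
    return [
        [
            max(image[r * s + i][c * s + j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 4 x 4 activation map.
activation_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
print(max_pool2d(activation_map))  # [[6, 2], [7, 9]]
```

Each output entry keeps only the largest value of its window, so a 4 x 4 map becomes 2 x 2, in line with the size (a - k)/s + 1 along each axis.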
FIGURE 3. Block diagram

In Equation (4), i and j are the rows and columns of the input image matrix, I is the input image, f is the kernel function, and x and y are the rows and columns of the resultant matrix, respectively.
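
Since the equation for the activation map is not reproduced in the text, the sketch below assumes the usual operation of a convolution layer: the kernel slides over the image with stride 1 and no padding, the elementwise products inside each window are summed, and ReLU is applied to the sum. Strictly, this is cross-correlation (the kernel is not flipped), which is what CNN libraries typically compute under the name convolution. The image and kernel values are hypothetical.

```python
def relu(x):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return max(0, x)


def activation_map(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and apply ReLU."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    result = []
    for x in range(rows):
        row = []
        for y in range(cols):
            total = sum(
                image[x + i][y + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(relu(total))
        result.append(row)
    return result


image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
# Hypothetical 3 x 3 kernel: left column minus right column of each window.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
print(activation_map(image, kernel))  # [[0, 2], [0, 0]]
```

Only the window whose left column is brighter than its right column produces a positive response; the negative sums are clipped to 0 by ReLU.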


3.3.2. Hidden layer 


The hidden layers apply nonlinear transformations to the network's inputs. The purpose of each hidden layer, and of each neuron within it, varies with the task the neural network performs. Each hidden layer computes the weighted sum of its inputs, adds the bias, and applies the activation function. There are two hidden layers and one output layer in this model. The Flatten operation converts the feature maps into a one-dimensional array, which is passed to the two hidden layers; these use 128 units and the ReLU activation function of Equation (2). The output layer uses the softmax activation function. The standard softmax function $\sigma : \mathbb{R}^n \to \mathbb{R}^n$ is defined in Equation (5):

$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} \tag{5}$$

for $i = 1, \dots, n$ and $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, where n is the total number of elements of the input vector x and $x_i$ represents each element of x. The softmax activation is used to classify the categories. The raw outputs of the last layer are generally not exactly 0 or 1; softmax converts them into probabilities between 0 and 1 that sum to 1, and the class with the highest probability is taken as the prediction.
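
The softmax step can be illustrated in a few lines of standard-library Python. The raw output values below are hypothetical, and the class order benign, early, pre, pro is an assumption; subtracting the maximum before exponentiating is a common numerical safeguard that does not change the result.

```python
import math


def softmax(x):
    """Softmax of a list of real numbers; subtracting the max avoids overflow."""
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ["benign", "early", "pre", "pro"]  # assumed label order
logits = [0.5, 2.0, -1.0, 0.1]  # hypothetical raw outputs of the last layer
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]
print([round(p, 3) for p in probs], predicted)
```

The probabilities sum to 1, and the predicted class is the one with the largest probability, here the second entry.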


3.4. Performance Analysis 


The model was trained on a CPU, and the classification model was built with Keras and TensorFlow. The model was trained for 30 epochs on 2604 images, and 652 images were used for the testing phase. Hyperparameter tuning led to a batch size of 64. The loss function is minimised at each epoch using the Adam optimizer, with the lowest loss reached at the final epoch. The trained model was then used to predict the type of cancer in the images. The outcomes of the proposed model are described first, followed by a comparison with recent deep learning and other state-of-the-art methods.
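
For a sense of scale, the training figures above imply the following number of weight updates. This short calculation assumes one optimizer update per batch and that the final, smaller batch of each epoch is kept rather than dropped; neither detail is stated in the text.

```python
import math

train_images = 2604  # from the text
batch_size = 64      # from the text
epochs = 30          # from the text

steps_per_epoch = math.ceil(train_images / batch_size)
last_batch = train_images - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 41 44 1230
```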


3.5. Desired Output 


The modified CNN is trained using the Adam optimizer. The loss function is sparse categorical cross-entropy, which represents each class label by a single integer rather than a whole vector; this reduces computation and memory requirements. The model's sensitivity is 96.9%. Accuracy, precision, recall, specificity, and F1-score are used for comparison and are determined by the following equations:


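$$AC = \frac{TP + TN}{TP + TN + FP + FN}$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$S = \frac{TN}{TN + FP}$$

$$F = \frac{2 \cdot P \cdot R}{P + R}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, and AC, P, R, S, and F denote accuracy, precision, recall, specificity, and F1-score, respectively.

FIGURE 4. Maxpooling Function

FIGURE 5. Modified CNN Architecture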


The proposed model's accuracy, precision, recall, specificity, and F1-score are 97.6%, 0.98, 0.98, 95.3%, and 0.98, respectively.
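
The five measures can be computed directly from the four counts, as in the sketch below. The counts used in the example are hypothetical and are not the paper's results; the function name is likewise an illustrative choice.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class problem such as the four classes here, the counts are usually taken per class (one class versus the rest) and the resulting scores are then averaged.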

Compared with state-of-the-art methods, deep learning performs better. For the machine learning methods, two features, the Discrete Fourier Transform (DFT) and the histogram, are extracted from the images and then used to train all classifiers. The SVM classification model was created using the RBF kernel and the supervised learning algorithm SVM (Suykens and Vandewalle). Naive Bayes, a probabilistic classifier, employs the Gaussian method to distinguish between the forms of cancer (Rish). The objective of the decision tree classifier is to predict the value of a target from the input sequence of the generated feature vector (Ben-Haim and Tom-Tov). Random forests, an ensemble learning approach, output the mean prediction of the individual decision trees built internally, which provides the final prediction (Liaw and Wiener). The VGG-16 convolutional neural network (CNN) model uses only 3 x 3 convolutional layers stacked on top of one another, followed by two fully connected layers with 4096 nodes each and a softmax layer for classification (Yu et al.). Owing to the combined advantages of MobileNetV2 and ResNet18, the hybrid ALL detection model performs well [10]. Three further classifiers were created: Ensemble1 (an ensemble of KNN, decision tree, SVM, and Naive Bayes classifiers), Ensemble2 (an ensemble of linear, Gaussian radial basis, multi-layer perceptron, and quadratic SVM kernels), and Ensemble3 (a classifier constructed using the quadratic SVM kernel); the first two are ensemble classifiers (Moshavash, Danyali, and Helfroush). Table 1 shows the comparison.
4. Results and Discussion


652 images are used for testing (validation), while 2604 images are used for training. The results are plotted once the 30 epochs of training are complete; training and validation accuracy increase together. The blue line shows the training accuracy, while the orange line shows the validation accuracy. The plot of training loss versus validation loss over the epochs behaves similarly, with the loss falling to about 0.2, as shown in Fig. 6. The confusion matrix obtained with the modified CNN model summarises the classification of the malignant white blood cells. The confusion matrix is a reliable and popular way to evaluate a classification model because it shows where the model fails and so indicates what to improve. The values are returned by scikit-learn's confusion_matrix() function (sklearn.metrics.confusion_matrix). Its layout differs slightly from that used in some earlier work: the rows correspond to the true label and the columns to the predicted label. Otherwise the idea is unchanged, as shown in Fig. 7.
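
The row and column convention can be illustrated with a small pure-Python stand-in for sklearn.metrics.confusion_matrix, which likewise places true labels on the rows and predicted labels on the columns. The label lists below are hypothetical, and the mapping 0 = benign, 1 = early, 2 = pre, 3 = pro is assumed.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows as true labels and columns as predicted labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Recall of class c = diagonal entry / sum of row c (all samples truly c)."""
    return [row[c] / sum(row) for c, row in enumerate(matrix)]


# Hypothetical labels: 0 = benign, 1 = early, 2 = pre, 3 = pro.
y_true = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [0, 1, 1, 1, 1, 2, 2, 3, 3, 2]

cm = confusion_matrix(y_true, y_pred, n_classes=4)
for row in cm:
    print(row)
print(per_class_recall(cm))
```

Off-diagonal entries show which classes are confused with which: in this toy example one benign sample is predicted as early and one pro sample as pre.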


FIGURE 6. Accuracy and loss graph of training and validation. Panels: Training and Validation Accuracy; Training and Validation Loss (x-axis: epochs, 0 to 30).


5. Conclusion 


The proposed approach first pre-processes the images and extracts their most informative features, and then classifies them with a modified convolutional neural network (CNN) structure. Finally, it identifies the kind of malignant tumour in the provided image. The model accuracy was 97.6%. Additionally, a comparison with a number of state-of-the-art methods, such as Support Vector Machines (SVMs), Decision Trees, Random Forests, Naive Bayes, VGG-16, the hybrid MobileNetV2 and ResNet18 model, E1, and E2, was conducted and presented. The suggested model outperformed these approaches, and in a direct comparison with some previously proposed models its accuracy was higher. As a result, the model can be used as a tool to accurately determine the type of cancerous tumour present in the bone marrow. However, we must acknowledge that a more extensive experimental investigation taking into account the dependence on database size has not been carried out here.

FIGURE 7. Confusion Matrix

FIGURE 8. Test image
TABLE 1. Comparison between state-of-the-art algorithms and Modified CNN

S.NO  ALGORITHM                        ACCURACY  PRECISION  RECALL  SPECIFICITY  F1-SCORE
1     SVM                              73.02     89.47      53.12   65.9         66.66
2     VGG-16                           90.1      84.88      93.58   87.5         89.01
3     NAIVE BAYES                      74.6      69.05      90.65   85.71        78.39
4     DECISION TREES                   96.77     94.11      100     100          96.96
5     RANDOM FOREST                    96.83     100        93.75   93.93        96.77
6     HYBRID - MOBILENETV2 & RESNET18  97.18     98.52      -       98.46        0.97
7     E1                               75.00     64.47      -       54.24        0.78
8     E2                               89.81     81.67      -       81.36        0.89
9     MODIFIED CNN                     97.69     0.98       0.98    95.3         0.98

Values are given as reported: most are percentages, while entries such as 0.97 or 0.98 are fractions; "-" marks a value that is not reported.


TABLE 2. Classification Report

              PRECISION  RECALL  F1-SCORE  SUPPORT
CLASS 1       0.97       0.91    0.94      106
CLASS 2       0.94       0.98    0.96      186
CLASS 3       1.00       0.99    0.99      203
CLASS 4       1.00       1.00    1.00      157
ACCURACY                         0.98      652
MACRO AVG     0.98       0.97    0.97      652
WEIGHTED AVG  0.98       0.98    0.98      652
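
The macro and weighted averages in Table 2 follow from the per-class rows, which the short check below reproduces. The per-class numbers are copied from the table; the variable and function names are illustrative. Because the table values are already rounded to two decimals, the check only recovers the averages to that precision.

```python
# Per-class values copied from Table 2: (precision, recall, f1-score, support).
report = {
    "class 1": (0.97, 0.91, 0.94, 106),
    "class 2": (0.94, 0.98, 0.96, 186),
    "class 3": (1.00, 0.99, 0.99, 203),
    "class 4": (1.00, 1.00, 1.00, 157),
}

total = sum(row[3] for row in report.values())


def macro_avg(col):
    """Unweighted mean of one metric over the classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Mean of one metric weighted by each class's support."""
    return sum(row[col] * row[3] for row in report.values()) / total


print("support", total)
for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(name, round(macro_avg(col), 2), round(weighted_avg(col), 2))
```

The macro average treats the four classes equally, whereas the weighted average gives more influence to the larger classes; the support-weighted recall is the same quantity as the overall accuracy.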


Authors’ Note: 


I sincerely thank my guide for supporting me throughout this project. Without the medical reports of patients, it would not have been feasible to detect acute leukaemia using deep learning algorithms. The medical information and insights have greatly advanced our understanding of acute leukaemia identification. The creation of the deep learning models and the use of the required computational resources were both greatly aided by this support. The guide's input heavily shaped the project's direction and contributed greatly to the success of this work. This research report is the result of countless hours of painstaking work. My research is intended to further the early diagnosis and better management of acute leukaemia, ultimately improving the quality of life for people around the world.